This is the annotated computer output for the three
clustering methods considered in the associated technical
report, BU-921-M and '87-5, Illustrative Examples of
Clustering using the Mixture Method and Two Comparable
Methods from SAS, by K.E. Basford, W.T. Federer, and N.J.
Miles-McDermott. The computer output for the normal mixture
model method is generated by a Fortran program, KMM,
written by K.E. Basford. The two other clustering methods
considered, Ward's method and the EML method, are from
SAS/CLUSTER, Version 5. Two real data sets are processed.


COMMENTS

The annotated output should be read in sequence because
explanations given on earlier pages are not necessarily
repeated later. Some pages may be composites of more than
one output page, and some output pages are omitted because
they are not useful for the purpose at hand. The mixture
model approach to clustering is described in detail, and
discussed in relation to other clustering methods, in
Basford (1986). SAS program documentation is in the SAS
User's Guide (1985a and b). Program documentation for KMM
is available from K.E. Basford and will appear in a
forthcoming book by McLachlan and Basford (1987).

The data are presented below. The KMM and SAS control
language for each example follows on pages 9-10. Control
language is given in capital type, with accompanying
descriptions and notes in boldface type. Program output
follows on pages 12-44, with annotations in boldface and
lower case type that describe the output values in some
detail.


DATA  SETS 

Two data sets are used for each of the three clustering
methods presented. The first data set was taken from
Habbema, Hermans, and van den Broek (1974). These examples
are labeled CL-1-Habbema through CL-3-Habbema on the output
pages. The second data set is the well-known Iris data
published by Fisher (1936). These examples are labeled
CL-1-Fisher through CL-3-Fisher on the output pages. For
each data set, the first example (CL-1-) illustrates the
normal mixture model method of clustering using the KMM
program, CL-2- illustrates Ward's method using SAS, and
CL-3- illustrates the EML method, also using SAS.

The data taken from Habbema et al. consist of 45
observations on known haemophilia A carriers and 30
observations on known noncarriers. These data are shown in
Table 1 and contain three variables. GROUP indicates
whether the individual is a carrier (coded 2) or a
noncarrier (coded 1). The two other variables,
log10(AHF activity) and log10(AHF-like antigen), are used
in the clustering programs to discriminate between the
normal individuals and the carriers. These variables are
named ACTIVITY and ANTIGEN, respectively.


TABLE 1: Habbema et al., Haemophilia Data

GROUP   ACTIVITY    ANTIGEN
  1     -0.00559   -0.16571
  1     -0.16980   -0.15852
  1     -0.34689   -0.18791
  1     -0.08944    0.00642
  1     -0.16791    0.07129
  1     -0.08362    0.01059
  1     -0.19789   -0.00054
  1     -0.07621    0.03919
  1     -0.19129   -0.21229
  1     -0.10919   -0.11904
  1     -0.52677   -0.47734
  1     -0.08419    0.02482
  1     -0.02252   -0.05805
  1      0.00841    0.07821
  1     -0.18266   -0.11384
  1      0.12366    0.21397
  1     -0.47022   -0.30989
  1     -0.15191   -0.06864
  1      0.00061   -0.11531
  1     -0.20154   -0.04976
  1     -0.19318   -0.22933
  1      0.15069    0.09331
  1     -0.12591   -0.06686
  1     -0.15508   -0.12321
  1     -0.19515   -0.10067
  1      0.02908    0.04419
  1     -0.22282   -0.17099
  1     -0.09971   -0.07333
  1     -0.19724   -0.06074
  1     -0.08670   -0.05597
  2     -0.49859   -0.08602
  2     -0.50145   -0.29844
  2     -0.13259    0.00970
  2     -0.34787   -0.17209
  2     -0.37553   -0.18652
  2     -0.24466   -0.04067
  2     -0.22047    0.00455
  2     -0.21539   -0.02191
  2     -0.25404   -0.05729
  2     -0.37780   -0.26816
  2     -0.06391    0.15694
  2     -0.33510   -0.13676
  2     -0.01493    0.15392
  2     -0.03124    0.14001
  2     -0.17402   -0.07764
  2     -0.09636    0.05307
  2     -0.02344    0.08038
  2     -0.40546   -0.24184
  2     -0.34776    0.11506
  2     -0.36180   -0.20082
  2     -0.69112   -0.33899
  2     -0.36083    0.12372
  2     -0.45348   -0.16817
  2     -0.35388    0.07219
  2     -0.47186   -0.10786
  2     -0.36097   -0.03994
  2     -0.32261    0.16697
  2     -0.43193   -0.06869
  2     -0.27342   -0.00203
  2     -0.55728    0.05480
  2     -0.49503   -0.01529
  2     -0.51066   -0.24825
  2     -0.16516    0.21321
  2     -0.42318   -0.09981
  2     -0.23746    0.28763
  2     -0.34470    0.00969
  2     -0.40465   -0.11618
  2     -0.14158    0.16416
  2     -0.15082    0.11372
  2     -0.26421    0.08669
  2     -0.33525    0.08753
  2     -0.18782    0.25096
  2     -0.17443    0.18924
  2     -0.24443    0.16137
  2     -0.47837    0.02821

The Fisher Iris data are shown in Table 2 and consist of
four measurements on 50 plants from each of three species
of Iris: Iris setosa, Iris versicolor, and Iris virginica.
These species were coded 1, 2, and 3, respectively, with a
variable name of GROUP. The four measurement variables
input into the clustering programs were sepal length
(SLENGTH), sepal width (SWIDTH), petal length (PLENGTH), and
petal width (PWIDTH).


TABLE 2: Fisher Iris Data

GROUP   SLENGTH   SWIDTH   PLENGTH   PWIDTH
  1       5.1      3.5       1.4      0.3
  1       4.4      3.2       1.3      0.2
  1       4.4      3.0       1.3      0.2
  1       5.0      3.5       1.6      0.6
  1       5.1      3.8       1.6      0.2
  1       4.9      3.1       1.5      0.2
  1       5.0      3.2       1.2      0.2
  1       4.6      3.2       1.4      0.2
  1       5.0      3.3       1.4      0.2
  1       4.8      3.4       1.9      0.2
  1       4.8      3.0       1.4      0.1
  1       5.0      3.5       1.3      0.3
  1       5.1      3.3       1.7      0.5
  1       5.0      3.4       1.5      0.2
  1       5.1      3.8       1.9      0.4
  1       4.9      3.0       1.4      0.2
  1       5.3      3.7       1.5      0.2
  1       4.3      3.0       1.1      0.1
  1       5.5      3.5       1.3      0.2
  1       4.8      3.4       1.6      0.2
  1       5.2      3.4       1.4      0.2
  1       4.8      3.1       1.6      0.2
  1       4.9      3.6       1.4      0.1
  1       4.6      3.1       1.5      0.2
  1       5.7      4.4       1.5      0.4
  1       5.7      3.8       1.7      0.3
  1       4.8      3.0       1.4      0.3
  1       5.2      4.1       1.5      0.1
  1       4.7      3.2       1.6      0.2
  1       4.5      2.3       1.3      0.3
  1       5.4      3.4       1.7      0.2
  1       5.0      3.0       1.6      0.2
  1       4.6      3.4       1.4      0.3
  1       5.4      3.9       1.3      0.4
SLENGTH   PLENGTH
  6.0       4.5
  6.5       4.6
  5.7       4.5
  6.1       4.7
  5.5       4.0
  5.5       4.4
  5.4       4.5
  6.3       4.7
  5.2       3.9
  6.4       4.3
  6.6       4.4
  5.7       3.5
  6.1       4.0
  6.0       4.0
  6.3       6.0
  6.7       5.7
  7.2       6.1
  7.7       6.7
  7.2       5.8
  7.4       6.1
  7.6       6.6
  7.7       6.7
  6.2       5.4
  7.7       6.1
  6.8       5.5
  6.4       5.3
  5.7       5.0
  6.9       5.1
  5.9       5.1
  6.3       5.6
  5.8       5.1
  6.3       4.9
  6.0       4.8
  7.2       6.0
  6.2       4.8
  6.9       5.4
  6.7       5.6
  6.4       5.5
  5.8       5.1
  6.1       4.9
  6.0       5.0
  6.4       5.3
  5.8       5.1
  6.9       5.7
  6.7       5.2
  7.7       6.9
  6.3       5.1
  6.5       5.2
  7.9       6.4
  6.1       5.6
  6.4       5.6
  6.3       5.0
          4.9      2.5       4.5      1.7
          6.8      3.2       5.9      2.3
          7.1      3.0       5.9      2.1
          6.7      3.3       5.7      2.5
          6.3      2.9       5.6      1.8
          6.5      3.0       5.5      1.8
          6.5      3.0       5.8      2.2
          7.3      2.9       6.3      1.8
          6.7      2.5       5.8      1.8
          5.6      2.8       4.9      2.0
          6.4      2.8       5.6      2.2
          6.5      3.2       5.1      2.0
Control Language


CL-1-Habbema (Mixture method from KMM)

75 2                      => 75 is the number of observations and 2 is
                             the number of variables
-0.005595 -0.165712       \
-0.169805 -0.158521        |  Input data: ACTIVITY and ANTIGEN
    .                      |
-0.478366  0.028215       /
2                         => Number of clusters to be formed
2                         => Instructs KMM to assume unequal covariance
                             matrices
1                         => Signals KMM that initial grouping estimates
                             follow
1 1 2 1 1 1 1 1 1 1       \
2 1 1 1 1 1 1 2 1 1        |
1 1 1 1 1 1 1 1 1 1        |
2 2 1 2 2 1 1 1 1 2        |  Initial groupings of observations
1 2 1 1 1 1 1 2 1 2        |  (results of Ward's method were used)
2 1 1 1 2 1 1 2 1 2        |
2 1 2 2 1 1 2 1 1 1        |
1 2 1 1 2                 /


CL-2-Habbema (Ward's method from SAS)

DATA GJ;
INPUT ACTIVITY ANTIGEN;                  => Input variables
IF _N_ LE 30 THEN GROUP=1;               \  Defines the GROUP variable
ELSE GROUP=2;                            /
CARDS;                                   => Signals SAS that the data follow
-0.005595 -0.165712
-0.169805 -0.158521
    .
-0.478366  0.028215
PROC CLUSTER OUTTREE=TREE METHOD=WARD;   \  Requests CLUSTER analysis using
VAR ACTIVITY ANTIGEN;                     |  Ward's method on ACTIVITY and
COPY GROUP;                              /  ANTIGEN
PROC TREE SORT HEIGHT=N;                 => Requests the cluster tree from
ID GROUP;                                   1 to n (75) clusters
PROC TREE NCL=2 OUT=OUT NOPRINT;         \
ID GROUP;                                 |  Causes SAS to produce the 2x2
PROC FREQ;                                |  table showing misclassifications
TABLE CLUSTER*GROUP;                     /

CL-3-Habbema  (EML  method  from  SAS) 


Same  control  language  as  for  2)  above  except  substitute  EML 
for  WARD  on  PROC  CLUSTER  line. 
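
As a rough illustration of what METHOD=WARD requests, the following Python sketch performs Ward's agglomeration on a handful of points: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. It is a naive sketch and not the SAS implementation; the function name `ward_clusters` and the toy points are assumptions made only for illustration.

```python
def ward_clusters(points, n_clusters):
    """Naive Ward agglomeration (hypothetical helper, not SAS code).

    Starts with one cluster per point and repeatedly merges the pair of
    clusters with the smallest increase in within-cluster sum of squares,
        n_a * n_b / (n_a + n_b) * ||centroid_a - centroid_b||^2,
    until n_clusters remain. Returns 1-based cluster labels.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members)
                for d in range(dim)]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * dist2

    while len(clusters) > n_clusters:
        # Find the cheapest pair (i < j) to merge.
        _, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    labels = [0] * len(points)
    for k, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = k
    return labels


# Toy data (assumed for illustration): two well-separated groups of points.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
toy_labels = ward_clusters(toy_points, 2)
```

The same function could be applied to the (ACTIVITY, ANTIGEN) pairs of Table 1 with `n_clusters=2`, which corresponds to the NCL=2 request on PROC TREE. The cluster numbering it returns is arbitrary and need not match the SAS numbering.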


CL-1-Fisher (Mixture method from KMM)

150 4                     => 150 is the number of observations and 4 is
                             the number of variables
5.1 3.5 1.4 0.3           \
4.4 3.2 1.3 0.2            |  Input data
    .                      |
6.5 3.2 5.1 2.0           /
3                         => Number of clusters to be formed
1                         => Instructs KMM to assume equal covariance
                             matrices
1                         => Signals KMM that initial grouping estimates
                             follow
1 1 1 1 1 1 1 1 1 1       \
1 1 1 1 1 1 1 1 1 1        |  Initial grouping of observations
    .                      |  (results of Ward's method were used)
3 3 3 3 3 3 3 2 3 3       /


CL-2-Fisher (Ward's method from SAS)

DATA ONE;
INPUT SLENGTH SWIDTH PLENGTH PWIDTH;         => Input variables
IF _N_ LE 50 THEN GROUP=1;                   \
ELSE IF _N_ LE 100 THEN GROUP=2;              |  Defines the GROUP variable
ELSE GROUP=3;                                /
CARDS;                                       => Signals SAS that the data follow
5.1 3.5 1.4 0.3
4.4 3.2 1.3 0.2
    .
6.5 3.2 5.1 2.0
PROC CLUSTER OUTTREE=TREE METHOD=WARD;       \  Requests the cluster analysis
VAR SLENGTH SWIDTH PLENGTH PWIDTH;            |  using Ward's method on the 4
COPY GROUP;                                  /  variables SLENGTH, SWIDTH,
                                                PLENGTH, and PWIDTH
PROC TREE DATA=TREE SORT HEIGHT=N;           \  Requests cluster tree
ID GROUP;                                    /
PROC TREE DATA=TREE NCL=3 OUT=OUT NOPRINT;   \
ID GROUP;                                     |  Requests the 3x3 table
COPY SLENGTH SWIDTH PLENGTH PWIDTH;           |  showing misclassifications
PROC FREQ;                                    |
TABLE CLUSTER*GROUP;                         /
PROC CANDISC NOPRINT OUT=CAN;                \  This series of commands is
CLASS CLUSTER;                                |  used to display cluster
VAR SLENGTH SWIDTH PLENGTH PWIDTH;            |  results. The CANDISC procedure
PROC PLOT;                                    |  is run to produce canonical
PLOT CAN2*CAN1=CLUSTER;                       |  variables for the cluster
PROC PLOT;                                    |  groups. The first 2 canonical
PLOT CAN2*CAN1=GROUP;                        /  variables are then plotted to
                                                show cluster membership
CL-3-Fisher (EML method from SAS)


Same control language as for 2) above except substitute EML for WARD
on PROC CLUSTER line.
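
The IF/ELSE statements in the SAS DATA steps above assign GROUP from the row number `_N_`, which works only because the data are entered in group order. A minimal Python sketch of the same rule follows; the function name `group_for_row` and its `cutoffs` parameter are hypothetical and are not part of SAS or KMM.

```python
def group_for_row(n, cutoffs=(50, 100)):
    """Return the GROUP code for 1-based row number n.

    With cutoffs=(50, 100) this mirrors the CL-2-Fisher statements
        IF _N_ LE 50 THEN GROUP=1;
        ELSE IF _N_ LE 100 THEN GROUP=2;
        ELSE GROUP=3;
    With cutoffs=(30,) it mirrors the CL-2-Habbema statements.
    """
    for code, cutoff in enumerate(cutoffs, start=1):
        if n <= cutoff:
            return code
    return len(cutoffs) + 1


# GROUP codes for the 150 Iris rows and the 75 haemophilia rows.
iris_groups = [group_for_row(n) for n in range(1, 151)]
habbema_groups = [group_for_row(n, cutoffs=(30,)) for n in range(1, 76)]
```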


CL-1-Habbema


Initial partition as specified by input
1 1 2 1 1 1 1 1 1 1
2 1 1 1 1 1 1 2 1 1
1 1 1 1 1 1 1 1 1 1
2 2 1 2 2 1 1 1 1 2
1 2 1 1 1 1 1 2 1 2
2 1 1 1 2 1 1 2 1 2
2 1 2 2 1 1 2 1 1 1
1 2 1 1 2
     Initial group allocation for each observation. The entry
     for row 1 column 1 refers to observation 1, row 1 column 2
     refers to observation 2, and so on.

Estimated mean (as a row vector) for each group
   ACTIVITY     ANTIGEN
  -0.221538   -0.032402   = GROUP 1
  -0.282643   -0.040757   = GROUP 2
     Group means for each variable based on the initial group
     allocation above.

Estimated covariance matrix for group 1 = S_ij (group 1)
   0.031661 = S_1^2
   0.010517 = S_12      0.019972 = S_2^2

Estimated covariance matrix for group 2 = S_ij (group 2)
   0.022859 = S_1^2
   0.016834 = S_12      0.030533 = S_2^2
     Covariance matrices for each group based on the initial
     group allocation.

Proportion from each group as specified by input
   0.720  0.280
     = Number initially assigned to group i / total number of
     observations.

In loop 55 log likelihood is 77.035
     77.035 is the log likelihood at the solution of the
     likelihood equation reached after 55 iterations of the EM
     algorithm.

Estimate of mixing proportion for each group
   0.508  0.492
     Estimate of the final proportion for each group under the
     normal mixture model.
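
The initial proportions 0.720 and 0.280 can be checked by counting the entries of the initial partition. The short Python sketch below does this, assuming the two printed half-rows of the listing pair up into rows of ten observations (with five in the last row); the variable names are illustrative only.

```python
# Initial partition for CL-1-Habbema, read as rows of ten observations.
# The pairing of the printed half-rows is an assumption about the layout.
initial_partition = [
    [1, 1, 2, 1, 1, 1, 1, 1, 1, 1],
    [2, 1, 1, 1, 1, 1, 1, 2, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [2, 2, 1, 2, 2, 1, 1, 1, 1, 2],
    [1, 2, 1, 1, 1, 1, 1, 2, 1, 2],
    [2, 1, 1, 1, 2, 1, 1, 2, 1, 2],
    [2, 1, 2, 2, 1, 1, 2, 1, 1, 1],
    [1, 2, 1, 1, 2],
]

labels = [g for row in initial_partition for g in row]
n = len(labels)                                   # 75 observations
counts = [labels.count(g) for g in (1, 2)]        # number initially in each group
proportions = [c / n for c in counts]             # counts divided by n

print(counts, [round(p, 3) for p in proportions])  # [54, 21] [0.72, 0.28]
```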

Entity: Final estimates of posterior probabilities of group membership

OBSERVATION   GROUP-1   GROUP-2
      1        0.999     0.001
      2        0.971     0.029
      3        0.245     0.755
      4        0.985     0.015
      5        0.710     0.290
      6        0.986     0.014
      7        0.773     0.227
      8        0.983     0.017
      9        0.958     0.042
     10        0.992     0.008
     11        0.001     0.999
     12        0.983     0.017
     13        0.999     0.001
     14        0.997     0.003
     15        0.948     0.052
     16        0.998     0.002
     17        0.014     0.986
     18        0.966     0.034
     19        0.999     0.001
     20        0.855     0.145
     21        0.957     0.043
     22        1.000     0.000
     23        0.982     0.018
     24        0.976     0.024
     25        0.922     0.078
     26        0.999     0.001
     27        0.899     0.101
     28        0.992     0.008
     29        0.882     0.118
     30        0.993     0.007
     31        0.003     0.997
     32        0.185     0.815
     33        0.001     0.999
     34        0.006     0.994
     35        0.949     0.051
     36        0.000     1.000
     37        0.002     0.998
     38        0.012     0.988
     39        0.223     0.777
     40        0.008     0.992
     41        0.004     0.996
     42        0.045     0.955
     43        0.002     0.998
     44        0.008     0.992
     45        0.274     0.726
     46        0.000     1.000
     47        0.125     0.875
     48        0.000     1.000
     49        0.004     0.996
     50        0.092     0.908
     51        0.606     0.394
     52        0.015     0.985
     53        0.001     0.999
     54        0.620     0.380
     55        0.736     0.264
     56        0.034     0.966
     57        0.591     0.409
     58        0.152     0.848
     59        0.032     0.968
     60        0.899     0.101
     61        0.241     0.759
     62        0.975     0.025
     63        0.970     0.030
     64        0.944     0.056
     65        0.426     0.574
     66        0.636     0.364
     67        0.963     0.037
     68        0.089     0.911
     69        0.992     0.008
     70        0.010     0.990
     71        0.016     0.984
     72        0.126     0.874
     73        0.073     0.927
     74        0.031     0.969
     75        0.000     1.000

     These estimates indicate the degree of certainty with which
     each observation belongs to one of the two groups. For
     example, observation 1 has a probability of .999 of
     belonging to group 1 and .001 of belonging to group 2.
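
Each observation is allocated to the group for which its estimated posterior probability is largest. The sketch below applies that rule to a few rows copied from the table of posterior probabilities; the helper name `allocate` is an assumption made for illustration.

```python
def allocate(probs):
    """Return the 1-based index of the group with the largest posterior."""
    return max(range(len(probs)), key=lambda i: probs[i]) + 1


# Observation number -> (GROUP-1, GROUP-2) posterior probabilities,
# copied from the CL-1-Habbema output.
posteriors = {
    1: (0.999, 0.001),
    3: (0.245, 0.755),
    5: (0.710, 0.290),
    65: (0.426, 0.574),
}

allocation = {obs: allocate(p) for obs, p in posteriors.items()}
print(allocation)  # {1: 1, 3: 2, 5: 1, 65: 2}
```

Observations such as 65, whose two probabilities are close to one another, are the ones allocated with the least certainty.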

Resulting partition of the entities into NG groups
1 1 2 1 1 1 1 1 1 1
2 1 1 1 1 1 2 1 1 1
1 1 1 1 1 1 1 1 1 1
2 2 2 2 1 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
1 2 2 1 1 2 1 2 2 1
2 1 1 1 2 1 1 2 1 2
2 2 2 2 2
     Final group allocations after 55 iterations of the
     clustering algorithm.

Number assigned to each group
   39  36

Estimates of correct allocation rates for each group
   0.934  0.908
     Overall estimate of the degree of certainty with which
     observations are allocated to each group.

Estimate of overall correct allocation rate  0.921
     Weighted average of the estimates of correct allocation
     rates for each group.
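
The overall rate for CL-1-Habbema can be reproduced from the per-group rates. The sketch below assumes that the weights in the "weighted average" are the estimated mixing proportions (0.508 and 0.492); the output does not state the weights explicitly, but this choice reproduces the printed 0.921.

```python
# Values copied from the CL-1-Habbema output.
mixing_proportions = [0.508, 0.492]   # estimated mixing proportions
group_rates = [0.934, 0.908]          # estimated correct allocation rates

# Assumption: the weights are the estimated mixing proportions.
overall_rate = sum(p * r for p, r in zip(mixing_proportions, group_rates))

print(round(overall_rate, 3))  # 0.921
```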

Estimated mean (as a row vector) for each group
   ACTIVITY     ANTIGEN
  -0.115406   -0.024497   = GROUP 1
  -0.365950   -0.045323   = GROUP 2
     Group means for each variable based on the final estimates
     of posterior probability of group membership.

Estimated covariance matrix for group 1 = S_ij (group 1)
   0.011245
   0.006548   0.012367

Estimated covariance matrix for group 2 = S_ij (group 2)
   0.015898
   0.015029   0.032278
     Based on the final estimates of posterior probability of
     group membership.
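
Given the final estimates, the posterior probabilities of group membership follow from Bayes' rule applied to the fitted normal components. The following sketch evaluates them for observation 1 of the haemophilia data using the final means, covariance matrices, and mixing proportions printed by KMM. The function names are assumptions made for illustration, and this is not the KMM source code.

```python
import math


def bivariate_normal_pdf(x, mean, cov):
    """Bivariate normal density; cov is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d' * inverse(cov) * d for a 2x2 matrix.
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def posterior(x, mixing, means, covs):
    """Posterior probability of each group: pi_i * f_i(x) / sum_k pi_k * f_k(x)."""
    weighted = [p * bivariate_normal_pdf(x, m, c)
                for p, m, c in zip(mixing, means, covs)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Final estimates copied from the CL-1-Habbema output.
mixing = (0.508, 0.492)
means = ((-0.115406, -0.024497), (-0.365950, -0.045323))
covs = (
    ((0.011245, 0.006548), (0.006548, 0.012367)),
    ((0.015898, 0.015029), (0.015029, 0.032278)),
)

obs1 = (-0.005595, -0.165712)  # observation 1: (ACTIVITY, ANTIGEN)
probs = posterior(obs1, mixing, means, covs)
```

For observation 1 the result is close to the 0.999 and 0.001 printed in the table of posterior probabilities. Small differences are to be expected because the printed estimates are rounded.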


CL-1-FISHER

Initial partition as specified by input
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
2 2 3 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
3 3 2 3 2 3 2 2 2 3
2 3 3 3 2 2 2 3 2 3
3 3 2 3 3 2 3 2 2 3
3 3 3 3 3 3 3 2 3 3
     Initial group allocation for each observation. The entry
     for row 1 column 1 refers to observation 1, row 1 column 2
     refers to observation 2, and so on.

Estimated mean (as a row vector) for each group
   SLENGTH    SWIDTH     PLENGTH    PWIDTH
   5.005994   3.427995   1.461996   0.246000   GROUP 1
   5.920269   2.751557   4.420300   1.434370   GROUP 2
   6.869439   3.086106   5.769438   2.105549   GROUP 3
     Group means for each variable based on the initial group
     allocation above.
Estimated covariance matrix for group 1 = S_ij (group 1)
   0.124213
   0.099176   0.143674
   0.016347   0.011713   0.030165
   0.010327   0.009296   0.006070   0.011106

Estimated covariance matrix for group 2 = S_ij (group 2)
   0.227175
   0.066786   0.087267
   0.141501   0.053037   0.277231
   0.034401   0.028532   0.117393   0.085792

Estimated covariance matrix for group 3 = S_ij (group 3)
   0.241609
   0.016371   0.082387
   0.185024   0.011265   0.230741
  -0.008398   0.027246   0.009312   0.059419
     Covariance matrices for each group based on the initial
     group allocation.

Estimated common covariance matrix
   0.196290
   0.065579   0.104907
   0.110146   0.029316   0.183807
   0.016186   0.021814   0.054552   0.054618
     In this run we specified that KMM assume equal covariance
     matrices for each group. This is the pooled estimate of
     that matrix based on the weighted average of the individual
     estimated covariance matrices.
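
The pooled estimate can be checked against the three group matrices. The sketch below assumes the weights are (n_i - 1)/(n - g), with initial group sizes 50, 64, and 36 (the counts behind the initial proportions 0.333, 0.427, and 0.240) and g = 3 groups. This weighting is not stated in the output, but it reproduces the printed common matrix to about five decimal places. The function name is illustrative only.

```python
def pooled_covariance(covs, sizes):
    """Pool lower-triangular covariance matrices.

    Assumption: pooled = sum((n_i - 1) * S_i) / (n - g), where g is the
    number of groups and n the total number of observations.
    """
    n, g = sum(sizes), len(sizes)
    pooled = []
    for r in range(len(covs[0])):
        row = []
        for c in range(r + 1):
            total = sum((n_i - 1) * s[r][c] for s, n_i in zip(covs, sizes))
            row.append(total / (n - g))
        pooled.append(row)
    return pooled


# Lower triangles copied from the CL-1-Fisher output (initial allocation).
group1 = [[0.124213],
          [0.099176, 0.143674],
          [0.016347, 0.011713, 0.030165],
          [0.010327, 0.009296, 0.006070, 0.011106]]
group2 = [[0.227175],
          [0.066786, 0.087267],
          [0.141501, 0.053037, 0.277231],
          [0.034401, 0.028532, 0.117393, 0.085792]]
group3 = [[0.241609],
          [0.016371, 0.082387],
          [0.185024, 0.011265, 0.230741],
          [-0.008398, 0.027246, 0.009312, 0.059419]]
printed_common = [[0.196290],
                  [0.065579, 0.104907],
                  [0.110146, 0.029316, 0.183807],
                  [0.016186, 0.021814, 0.054552, 0.054618]]

sizes = [50, 64, 36]  # initial group sizes, from the initial partition
pooled = pooled_covariance([group1, group2, group3], sizes)

# Largest absolute difference from the printed common covariance matrix.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(pooled, printed_common)
               for a, b in zip(row_a, row_b))
```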

Proportion from each group as specified by input
   0.333  0.427  0.240
     0.333 = 50/150 = number initially assigned to group 1 /
     total number of observations.

In loop 30 log likelihood is -256.354
     Log likelihood at the solution of the likelihood equation
     reached after 30 iterations of the EM algorithm.

Estimate of mixing proportion for each group
   0.333  0.330  0.337
     Estimate of the final proportion for each group under the
     normal mixture model.
Entity: Final estimates of posterior probabilities of group membership

OBSERVATION   GROUP 1   GROUP 2   GROUP 3
      1        1.000     0.000     0.000
      2        1.000     0.000     0.000
      3        1.000     0.000     0.000
      4        1.000     0.000     0.000
      5        1.000     0.000     0.000
      6        1.000     0.000     0.000
      7        1.000     0.000     0.000
      8        1.000     0.000     0.000
      9        1.000     0.000     0.000
     10        1.000     0.000     0.000
     11        1.000     0.000     0.000
     12        1.000     0.000     0.000
     13        1.000     0.000     0.000
     14        1.000     0.000     0.000
     15        1.000     0.000     0.000
     16        1.000     0.000     0.000
     17        1.000     0.000     0.000
     18        1.000     0.000     0.000
     19        1.000     0.000     0.000
     20        1.000     0.000     0.000
     21        1.000     0.000     0.000
     22        1.000     0.000     0.000
     23        1.000     0.000     0.000
     24        1.000     0.000     0.000
     25        1.000     0.000     0.000
     26        1.000     0.000     0.000
     27        1.000     0.000     0.000
     28        1.000     0.000     0.000
     29        1.000     0.000     0.000
     30        1.000     0.000     0.000
     31        1.000     0.000     0.000
     32        1.000     0.000     0.000
     33        1.000     0.000     0.000
     34        1.000     0.000     0.000
     35        1.000     0.000     0.000
     36        1.000     0.000     0.000
     37        1.000     0.000     0.000
     38        1.000     0.000     0.000
     39        1.000     0.000     0.000
     40        1.000     0.000     0.000
     41        1.000     0.000     0.000
     42        1.000     0.000     0.000
     43        1.000     0.000     0.000
     44        1.000     0.000     0.000
     45        1.000     0.000     0.000
     46        1.000     0.000     0.000
     47        1.000     0.000     0.000
     48        1.000     0.000     0.000
     49        1.000     0.000     0.000
     50        1.000     0.000     0.000
     51        0.000     0.999     0.001
     52        0.000     1.000     0.000
     53        0.000     1.000     0.000
     54        0.000     1.000     0.000
     55        0.000     1.000     0.000
     56        0.000     1.000     0.000
     57        0.000     0.999     0.001
     58        0.000     1.000     0.000
     59        0.000     1.000     0.000
     60        0.000     1.000     0.000
     61        0.000     1.000     0.000
     62        0.000     1.000     0.000
     63        0.000     0.704     0.296
     64        0.000     1.000     0.000
     65        0.000     1.000     0.000
     66        0.000     1.000     0.000
     67        0.000     1.000     0.000
     68        0.000     1.000     0.000
     69        0.000     1.000     0.000
     70        0.000     0.997     0.003
     71        0.000     1.000     0.000
     72        0.000     0.967     0.033
     73        0.000     1.000     0.000
     74        0.000     1.000     0.000
     75        0.000     1.000     0.000
     76        0.000     0.998     0.002
     77        0.000     0.999     0.001
     78        0.000     0.127     0.873
     79        0.000     1.000     0.000
     80        0.000     0.999     0.001
     81        0.000     0.979     0.021
     82        0.000     0.133     0.867
     83        0.000     0.868     0.132
     84        0.000     0.991     0.009
     85        0.000     1.000     0.000
     86        0.000     1.000     0.000
     87        0.000     0.988     0.012
     88        0.000     0.997     0.003
     89        0.000     0.998     0.002
     90        0.000     0.994     0.006
     91        0.000     1.000     0.000
     92        0.000     0.999     0.001
     93        0.000     0.929     0.071
     94        0.000     0.979     0.021
     95        0.000     0.999     0.001
     96        0.000     1.000     0.000
     97        0.000     1.000     0.000
     98        0.000     1.000     0.000
     99        0.000     1.000     0.000
    100        0.000     1.000     0.000
    101        0.000     0.000     1.000
    102        0.000     0.000     1.000
    103        0.000     0.000     1.000
    104        0.000     0.000     1.000
    105        0.000     0.148     0.852
    106        0.000     0.000     1.000
    107        0.000     0.000     1.000
    108        0.000     0.000     1.000
    109        0.000     0.000     1.000
    110        0.000     0.000     1.000
    111        0.000     0.000     1.000
    112        0.000     0.002     0.998
    113        0.000     0.000     1.000
    114        0.000     0.000     1.000
    115        0.000     0.009     0.991
    116        0.000     0.000     1.000
    117        0.000     0.001     0.999
    118        0.000     0.094     0.906
    119        0.000     0.123     0.877
    120        0.000     0.003     0.997
    121        0.000     0.162     0.838
    122        0.000     0.001     0.999
    123        0.000     0.000     1.000
    124        0.000     0.004     0.996
    125        0.000     0.001     0.999
    126        0.000     0.089     0.911
    127        0.000     0.302     0.698
    128        0.000     0.000     1.000
    129        0.000     0.000     1.000
    130        0.000     0.000     1.000
    131        0.000     0.000     1.000
    132        0.000     0.000     1.000
    133        0.000     0.746     0.254
    134        0.000     0.002     0.998
    135        0.000     0.000     1.000
    136        0.000     0.073     0.927
    137        0.000     0.000     1.000
    138        0.000     0.006     0.994
    139        0.000     0.022     0.978
    140        0.000     0.000     1.000
    141        0.000     0.000     1.000
    142        0.000     0.000     1.000
    143        0.000     0.001     0.999
    144        0.000     0.005     0.995
    145        0.000     0.000     1.000
    146        0.000     0.000     1.000
    147        0.000     0.000     1.000
    148        0.000     0.000     1.000
    149        0.000     0.000     1.000
    150        0.000     0.008     0.992

     These estimates indicate the degree of certainty with which
     each observation belongs to one of the three groups.
     Observation 1 has a probability of 1.0 of belonging to
     group 1 and 0 of belonging to the other two groups.

Resulting partition of the entities into NG groups
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 3 2 2
2 3 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3
3 3 2 3 3 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3
     Final group allocations after 30 iterations.

Number assigned to each group
   50  49  51

Estimates of correct allocation rates for each group
   1.000  0.973  0.983
     Overall estimate of the degree of certainty with which
     observations are allocated to each group.

Estimate of overall correct allocation rate  0.985
     = Weighted average of the estimates of correct allocation
     rates for each group.

Estimated mean (as a row vector) for each group
   SLENGTH    SWIDTH     PLENGTH    PWIDTH
   5.005994   3.427995   1.461996   0.246000
   5.942309   2.760773   4.258801   1.319220
   6.574652   2.980818   5.539058   2.024963
     Group means for each variable based on estimates of
     posterior probability of group membership.

Estimated common covariance matrix
   0.263932
   0.089847   0.111946
   0.169658   0.051118   0.186544
   0.039336   0.029976   0.041973   0.039709
     This pooled estimate of the common covariance matrix is
     based on the final estimates of posterior probability of
     group membership.
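
Because the Iris observations are entered in species order, the final partition can be compared directly with the known GROUP codes. The sketch below counts the disagreements. It assumes that the two printed half-rows of the partition pair up into rows of ten observations and that mixture groups 1, 2, and 3 correspond to species codes 1, 2, and 3; both are assumptions about the listing, not statements from the output.

```python
# Final partition for CL-1-Fisher, read as 15 rows of ten observations.
final_partition = (
    [[1] * 10 for _ in range(5)]
    + [[2] * 10,
       [2] * 10,
       [2, 2, 2, 2, 2, 2, 2, 3, 2, 2],
       [2, 3, 2, 2, 2, 2, 2, 2, 2, 2],
       [2] * 10]
    + [[3] * 10,
       [3] * 10,
       [3] * 10,
       [3, 3, 2, 3, 3, 3, 3, 3, 3, 3],
       [3] * 10]
)

labels = [g for row in final_partition for g in row]
true_groups = [1] * 50 + [2] * 50 + [3] * 50  # species codes in data order

group_sizes = [labels.count(g) for g in (1, 2, 3)]
mismatches = [i + 1 for i, (a, b) in enumerate(zip(labels, true_groups))
              if a != b]

print(group_sizes, mismatches)  # [50, 49, 51] [78, 82, 133]
```

The group sizes agree with the "Number assigned to each group" line, and the three observations that disagree are ones for which the table of posterior probabilities favours a group other than the known species.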
EIGENVALUES OF THE COVARIANCE MATRIX

[The tabular SAS output on this page is not legible in the source.]
PLOT OF CAN2*CAN1   SYMBOL IS VALUE OF CLUSTER


13 OBS HIDDEN

This plot displays the results when three clusters are formed. The first two canonical variables (CAN1 and CAN2) for discriminating among the
three clusters were computed and plotted to show cluster membership. The symbol plotted is the value of CLUSTER.


PLOT OF CAN2*CAN1   SYMBOL IS VALUE OF GROUP


13 OBS HIDDEN

This plot is exactly the same as the one on the previous page except that the symbol plotted is the value of GROUP.

EQUAL VARIANCE MAXIMUM LIKELIHOOD